Summary

We introduce a covariate-specific total variation penalty in two semiparametric models for the rate function of a recurrent event process. The two models are a stratified Cox model, introduced in Prentice et al. (1981), and a stratified Aalen's additive model. We show the consistency and asymptotic normality of our penalized estimators. We demonstrate, through a simulation study, that our estimators outperform classical estimators for small to moderate sample sizes. Finally, an application to the bladder tumour data of Byar (1980) is presented.
1. Introduction

Recurrent events are frequent in clinical or epidemiological studies when each subject experiences repeated events over time. Standard medical examples include the repetition of asthma attacks, epileptic seizures or tumour recurrences for individual patients. In this context, proportional hazards models have been widely studied in the literature to model the rate or mean functions of recurrent event data. For instance, Andersen & Gill (1982) introduce a conditional Cox model where the recurrent event process is assumed to be a Poisson process. Without this assumption, similar proportional hazards models and extensions are considered in Lawless & Nadeau (1995), Lin et al. (1998), Lin et al. (2000) and Cai & Schaubel (2004).

To model rate functions in a recurrent events context, a different approach consists in fitting a separate Cox model for each recurrence. Along these lines, Prentice et al. (1981) introduce two stratified proportional hazards models with event-specific baseline hazards and regression coefficients. Gap time and conditional models are presented in their paper and a marginal event-specific model is studied in Wei et al. (1989). We refer to Kelly & Lim (2000) for a complete review of existing Cox-based recurrent event models.

Additive models provide a useful alternative to proportional hazards models. For classical counting processes, the Aalen model was first introduced in Aalen (1980) and is extensively studied in McKeague (1988), Huffer & McKeague (1991) and Lin & Ying (1994). It is considered in the context of recurrent events in Scheike (2002). We propose in this paper to consider an event-stratified version of the Aalen model, in the manner of Prentice et al. (1981).

As demonstrated in the following, event-stratified models allow more flexibility but 
suffer from over-parametrization as soon as the sample size is not large enough with 
respect to the number of covariates and the number of recurrent events. We address this 
drawback by introducing new estimators defined as minimizers of penalized empirical 
risks. More specifically, we consider a covariate-specific total variation penalty. 

The remainder of this article is organized as follows. The multiplicative and additive models studied in this paper are presented in Paragraph 2-1. In Paragraph 2-4, we describe our novel algorithms. They require preliminary details on inference in these two models, which are given in Paragraphs 2-2 and 2-3. Consistency and asymptotic normality of the estimators are derived in Section 3. Simulation studies and a real data analysis are provided in Sections 4 and 5. A discussion and some concluding remarks are contained in Section 6.



2. Models and algorithm 

2-1. Models 

Let $D$ denote the time of the terminal event and $N^*(t)$ the number of recurrent events before time $t$. The end-point of the observation is $\tau > 0$. The $p$-dimensional process of covariates is denoted by $X$ and $\rho_0$ represents the rate function. The event-specific rate function of the process $N^*$ is then defined as

$$E\{dN^*(t) \mid X(t), D \ge t, N^*(t) = s-1\} = \mathbb{1}(D \ge t)\,\rho_0(t, s, X(t))\,dt,$$

for $t$ in $[0, \tau]$ and $s = 1, \ldots, B$. Apart from the stratification, this definition of the rate function can be found in Scheike (2002).

We consider two semiparametric models for the function $\rho_0$. The first one is an event-specific multiplicative rate model introduced in Prentice et al. (1981). In this model, the rate function is specified, for $t$ in $[0, \tau]$, by

$$\rho_0(t, s, X(t)) = \alpha_0(t, s) \exp\big(X(t)\beta_0(s)\big), \tag{1}$$

where for each event number $s$, $\beta_0(s)$ is an unknown $p$-dimensional vector of parameters and $\alpha_0$ is an unknown baseline function.

Following Scheike (2002) and Zeng & Cai (2010), we also propose to consider its additive counterpart. The rate function in our event-specific additive model is then, for $t$ in $[0, \tau]$:

$$\rho_0(t, s, X(t)) = \alpha_0(t, s) + X(t)\beta_0(s). \tag{2}$$

The models where $\beta_0$ is constant over the events are referred to as constant models in what follows.
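To make the two specifications concrete, the following minimal sketch evaluates the rate functions of models (1) and (2) at a single point $(t, s)$. The numerical values of the covariates, coefficients and baseline are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np


def multiplicative_rate(alpha0, x, beta_s):
    """Model (1): alpha_0(t, s) * exp(X(t) beta_0(s))."""
    return alpha0 * np.exp(np.dot(x, beta_s))


def additive_rate(alpha0, x, beta_s):
    """Model (2): alpha_0(t, s) + X(t) beta_0(s)."""
    return alpha0 + np.dot(x, beta_s)


# Hypothetical values for one subject at a fixed time t and event number s.
x = np.array([1.0, 0.0, 2.0, 0.5])        # covariates X(t), p = 4
beta_s = np.array([0.2, -0.1, 0.0, 0.4])  # coefficients beta_0(s)
alpha0 = 1.5                              # baseline alpha_0(t, s)

print(multiplicative_rate(alpha0, x, beta_s))
print(additive_rate(alpha0, x, beta_s))
```

In a constant model the same vector `beta_s` would be used for every event number $s$, whereas the event-specific models allow it to change with $s$.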

We consider the problem of estimating the unknown parameter $\beta_0$ in the stratified models (1) and (2) on the basis of data from $n$ independent and identically distributed random variables. Introduce the censoring time $C$. In a random sample of $n$ subjects, the data consist of $\{N_i(t), T_i, \delta_i, X_i(t), t \le \tau\}$, $i = 1, \ldots, n$, where $N_i(t) = N_i^*(t \wedge C_i)$, $T_i = D_i \wedge C_i$ is the minimum of $D_i$ and $C_i$, $\delta_i = \mathbb{1}(D_i \le C_i)$ and $(X_i(t), 0 \le t \le T_i)$ is the covariate process. The next assumption characterizes the dependence mechanism between the censoring time and the other variables.

Assumption 1. For all $s = 1, \ldots, B$ and $t$ in $[0, \tau]$,

$$E\{dN^*(t) \mid X(t), D \wedge C \ge t, N^*(t) = s-1\} = E\{dN^*(t) \mid X(t), D \ge t, N^*(t) = s-1\}.$$

Note that this assumption is slightly weaker than assuming independence between $C$ and $(N^*, D, X)$. A similar assumption can be found for instance in Lin et al. (2000). We also impose the following conditions on the tails of the distribution of $T$ and $N$.

Assumption 2. There exists a nonnegative integer $B$ such that

(i) $\forall t \in [0, \tau]$, $P(N(t) \le B) = 1$,
(ii) $\forall t \in [0, \tau]$, $\forall s = 1, \ldots, B$, $P(T \ge t, N(t) = s-1 \mid X(t)) > 0$.

Assumption 2 (i) ensures that in models (1) and (2), the total number of observed events is almost surely bounded. It is standard for inference on recurrent event processes, see e.g. Dauxois & Sencey (2009), Scheike (2002) or Bouaziz et al. (2013).

Under Assumption 2, the unknown vector of parameters $\beta_0$ has $p \times B$ unknown coefficients to be estimated. For reasonable sample sizes $n$, these models are over-parametrized in the sense that, when $\sqrt{n} < p \times B$, the estimators behave very poorly (see Section 4 for an illustration). On the other hand, simpler forms of models (1) and (2), in which the unknown parameter does not change with the event, $\beta_0(s) = \beta_0$, might be too poor to accurately fit the data (see also Section 4 and the discussion in Kelly & Lim (2000)). In this paper, we aim at providing estimators that achieve a compromise between these two situations.

In the following, we define, for each individual $i$, the event-specific at-risk function $Y_i^s$ and the overall at-risk function $Y_i$ for all $t$ in $[0, \tau]$:

$$Y_i^s(t) = \mathbb{1}(T_i \ge t, N_i(t) = s), \qquad Y_i(t) = \sum_{s=1}^{B} Y_i^s(t) = \mathbb{1}(T_i \ge t).$$

2-2. Inference in the multiplicative model 
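As in Prentice et al. (1981), in the multiplicative event-specific model (1), an estimator $\hat\beta_{\mathrm{ES/mult}}$ of the unknown parameter $\beta_0 \in \mathbb{R}^{p \times B}$ is defined as the maximizer of the partial log-likelihood, or equivalently as

$$\hat\beta_{\mathrm{ES/mult}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} L_n^{\mathrm{PL}}(\beta) = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} \left\{ -\frac{1}{n} \sum_{s=1}^{B} \sum_{i=1}^{n} \int_0^\tau \left[ X_i(t)\beta(s) - \log\left( \sum_{j=1}^{n} Y_j^s(t) \exp\big(X_j(t)\beta(s)\big) \right) \right] Y_i^s(t)\, dN_i(t) \right\}. \tag{3}$$

An estimator $\hat\beta_{\mathrm{C/mult}}$ in the constant model is defined as

$$\hat\beta_{\mathrm{C/mult}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p}} \left\{ -\frac{1}{n} \sum_{i=1}^{n} \int_0^\tau \left[ X_i(t)\beta - \log\left( \sum_{j=1}^{n} Y_j(t) \exp\big(X_j(t)\beta\big) \right) \right] Y_i(t)\, dN_i(t) \right\}. \tag{4}$$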

2-3. Inference in the additive model 
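As noticed in Martinussen & Scheike (2009a, b) or Gaiffas & Guilloux (2012), in the usual additive hazards model, the estimator $\hat\beta_{\mathrm{ES/add}}$ of the unknown parameter $\beta_0 \in \mathbb{R}^{p \times B}$ can be written as the minimizer of a (partial) least-squares criterion:

$$\hat\beta_{\mathrm{ES/add}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} L_n^{\mathrm{PLS}}(\beta) = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} \sum_{s=1}^{B} \left\{ \beta(s)^\top H_n(s) \beta(s) - 2 h_n(s) \beta(s) \right\}, \tag{5}$$

where for all $s \in \{1, \ldots, B\}$, $H_n(s)$ are $p \times p$ symmetric positive semidefinite matrices equal to

$$H_n(s) = \frac{1}{n} \sum_{i=1}^{n} \int_0^\tau Y_i^s(t) \big(X_i(t) - \bar X^s(t)\big)^\top \big(X_i(t) - \bar X^s(t)\big)\, dt,$$

and where $h_n(s)$ are $p$-dimensional vectors equal to

$$h_n(s) = \frac{1}{n} \sum_{i=1}^{n} \int_0^\tau \mathbb{1}(N_i(t) = s) \big(X_i(t) - \bar X^s(t)\big)\, dN_i(t),$$

with $\bar X^s(t) = \sum_{i=1}^{n} X_i(t) Y_i^s(t) / \sum_{i=1}^{n} Y_i^s(t)$. We show in the Appendix why this criterion is a relevant strategy in the additive event-specific model.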

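On the other hand, an estimator $\hat\beta_{\mathrm{C/add}}$ in the constant model is defined as

$$\hat\beta_{\mathrm{C/add}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p}} \left\{ \beta^\top H_n \beta - 2 h_n \beta \right\}, \quad \text{with } H_n = \sum_{s=1}^{B} H_n(s) \text{ and } h_n = \sum_{s=1}^{B} h_n(s). \tag{6}$$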
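Because the least-squares criteria above are quadratic in $\beta$, their unpenalized minimizers solve a linear system. The sketch below illustrates this for a single event number $s$: it minimizes $\beta^\top H \beta - 2 h \beta$ by solving $H\beta = h$. The matrix `H` and vector `h` are randomly generated stand-ins for $H_n(s)$ and $h_n(s)$ (an assumption made for illustration, not data from the paper).

```python
import numpy as np


def least_squares_criterion(beta, H, h):
    """Value of beta^T H beta - 2 h beta for one event number s."""
    return float(beta @ H @ beta - 2.0 * h @ beta)


def minimise_criterion(H, h):
    """Solve the first-order condition H beta = h.

    lstsq is used instead of solve so that a singular H
    (possible when few subjects reach event s) does not raise.
    """
    beta, *_ = np.linalg.lstsq(H, h, rcond=None)
    return beta


rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
H = A.T @ A / 20        # hypothetical stand-in for H_n(s), symmetric positive definite
h = rng.normal(size=3)  # hypothetical stand-in for h_n(s)

beta_hat = minimise_criterion(H, h)
print(beta_hat, least_squares_criterion(beta_hat, H, h))
```

In the event-specific criterion (5) this computation is repeated for each $s$, while in the constant criterion (6) it is applied once to the summed quantities $H_n$ and $h_n$.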



2-4. A total-variation penalty 
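To overcome the possible over-parametrization of models (1) and (2), we propose to define penalized versions of criteria (3) and (5). For all $\beta = (\beta(s), s = 1, \ldots, B)$ with $\beta(s) = (\beta^1(s), \ldots, \beta^p(s))$, define for all $j = 1, \ldots, p$

$$\beta^j = (\beta^j(1), \ldots, \beta^j(B)) \quad \text{and} \quad \mathrm{TV}(\beta^j) = \sum_{s=2}^{B} |\beta^j(s) - \beta^j(s-1)| = \sum_{s=2}^{B} |\Delta\beta^j(s)|. \tag{7}$$

We now consider the minimizers of the partial log-likelihood (respectively the partial least-squares) penalized with a covariate-specific total variation. Define the penalized estimators in models (1) and (2) as:

$$\hat\beta_{\mathrm{TV/mult}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} \left\{ L_n^{\mathrm{PL}}(\beta) + \frac{\lambda_n}{n} \sum_{j=1}^{p} \mathrm{TV}(\beta^j) \right\} \quad \text{and} \tag{8}$$

$$\hat\beta_{\mathrm{TV/add}} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{p \times B}} \left\{ L_n^{\mathrm{PLS}}(\beta) + \frac{\lambda_n}{n} \sum_{j=1}^{p} \mathrm{TV}(\beta^j) \right\}. \tag{9}$$

These penalized algorithms can be rewritten as lasso algorithms (the details are given in the Supplementary Material).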
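The lasso rewriting itself is deferred to the Supplementary Material, so the following is only a sketch of the standard change of variables that turns a total variation penalty into an $\ell_1$ penalty; it is an assumption that the authors proceed this way. Writing $\theta^j(1) = \beta^j(1)$ and $\theta^j(s) = \Delta\beta^j(s)$ for $s \ge 2$, the penalty (7) becomes the $\ell_1$ norm of $(\theta^j(2), \ldots, \theta^j(B))$, and $\beta^j$ is recovered by cumulative sums. The function names and the numerical values are hypothetical.

```python
import numpy as np


def total_variation(beta_j):
    """TV(beta^j) = sum over s >= 2 of |beta^j(s) - beta^j(s-1)|, as in (7)."""
    return float(np.sum(np.abs(np.diff(beta_j))))


def to_increments(beta_j):
    """theta^j(1) = beta^j(1), theta^j(s) = beta^j(s) - beta^j(s-1) for s >= 2."""
    beta_j = np.asarray(beta_j, dtype=float)
    return np.concatenate(([beta_j[0]], np.diff(beta_j)))


def from_increments(theta_j):
    """Inverse map: beta^j(s) is the cumulative sum of the increments."""
    return np.cumsum(theta_j)


# Hypothetical effect of one covariate over B = 5 events.
beta_j = np.array([0.0, 0.0, 0.7, 0.7, 0.0])
theta_j = to_increments(beta_j)

print(theta_j)                     # sparse increments
print(total_variation(beta_j))     # TV penalty on beta^j
print(np.sum(np.abs(theta_j[1:])))  # same value, as an l1 norm on theta^j
```

The first increment $\theta^j(1)$ is left out of the $\ell_1$ norm, so only changes between consecutive events are penalized; a sparse $\theta^j$ corresponds to a piecewise-constant $\beta^j$.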



3. Asymptotic results 
We successively provide the asymptotic results for the estimators $\hat\beta_{\mathrm{TV/add}}$ in the additive model and $\hat\beta_{\mathrm{TV/mult}}$ in the multiplicative model. In both models, the following condition is required.

Assumption 3. The covariate process $X(\cdot)$ is of bounded variation on $[0, \tau]$.

Define for all $s = 1, \ldots, B$ the centered process $M^s(t) = N(t) - E(N(t) \mid X(t), D \wedge C \ge t, N(t) = s-1)$ and the $p \times p$ matrix

$$H(s) := \int E[Y^s(t) X(t)^\top X(t)]\, dt - \int \frac{\big(E[Y^s(t) X(t)]\big)^{\otimes 2}}{E[Y^s(t)]}\, dt,$$

which from Assumption 2 (ii) is well defined.

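Theorem 1. Assume that, for each $s = 1, \ldots, B$, $H(s)$ is non-singular and that Assumptions 1, 2 and 3 are fulfilled.

1. If $\lambda_n / n \to 0$ as $n \to \infty$ then $\hat\beta_{\mathrm{TV/add}}$ converges to $\beta_0$ in probability.

2. If $\lambda_n / \sqrt{n} \to \lambda_0 \ge 0$ as $n \to \infty$ then $\sqrt{n}(\hat\beta_{\mathrm{TV/add}} - \beta_0)$ converges in distribution to

$$\operatorname*{argmin}_{u \in \mathbb{R}^{p \times B}} \Lambda_{\mathrm{add}}(u) = \operatorname*{argmin}_{u \in \mathbb{R}^{p \times B}} \sum_{s=1}^{B} \left\{ u(s)^\top H(s) u(s) - 2 u(s)^\top \xi_{\mathrm{add}}(s) \right\} + \lambda_0 \sum_{j=1}^{p} \sum_{s=2}^{B} \left\{ |\Delta u^j(s)|\, \mathbb{1}(\Delta\beta_0^j(s) = 0) + \operatorname{sgn}(\Delta\beta_0^j(s)) \big(\Delta u^j(s)\big)\, \mathbb{1}(\Delta\beta_0^j(s) \ne 0) \right\}$$

and for each $s$, $\xi_{\mathrm{add}}(s)$ is a centered $p$-dimensional Gaussian vector with covariance matrix equal to

$$E\left[ \left( \int \left( X(t) - \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]} \right) \mathbb{1}(N(t) = s)\, dM^s(t) \right)^{\otimes 2} \right].$$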

Define for all $s = 1, \ldots, B$ and for all $t \in [0, \tau]$,

$$s^{(l)}(s, t, \beta) = E[Y^s(t) X(t)^{\otimes l} \exp(X(t)\beta(s))], \quad l = 0, 1, 2.$$

Introduce $e(s, t, \beta) = s^{(1)}(s, t, \beta) / s^{(0)}(s, t, \beta)$, $v(s, t, \beta) = s^{(2)}(s, t, \beta) / s^{(0)}(s, t, \beta) - e(s, t, \beta)^{\otimes 2}$ and $\Sigma(s, \beta) = \int v(s, t, \beta) E[Y^s(t)\, dN(t)]$. For any $s = 1, \ldots, B$ and for any $t \in [0, \tau]$, the three functions $s^{(l)}(s, t, \beta_0)$ are bounded from Assumption 3 and $e(s, t, \beta)$, $v(s, t, \beta)$ and $\Sigma(s, \beta)$ are finite from Assumptions 2 and 3.

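Theorem 2. Assume that for each $s = 1, \ldots, B$, $\Sigma(s, \beta_0)$ is non-singular and that Assumptions 1, 2 and 3 are fulfilled.

1. If $\lambda_n / n \to 0$ as $n \to \infty$ then $\hat\beta_{\mathrm{TV/mult}}$ converges to $\beta_0$ in probability.

2. If $\lambda_n / \sqrt{n} \to \lambda_0 \ge 0$ as $n \to \infty$ then $\sqrt{n}(\hat\beta_{\mathrm{TV/mult}} - \beta_0)$ converges in distribution to

$$\operatorname*{argmin}_{u \in \mathbb{R}^{p \times B}} \Lambda_{\mathrm{mult}}(u) = \operatorname*{argmin}_{u \in \mathbb{R}^{p \times B}} \sum_{s=1}^{B} \left\{ \frac{1}{2} u(s)^\top \Sigma(s, \beta_0) u(s) - u(s)^\top \xi_{\mathrm{mult}}(s) \right\} + \lambda_0 \sum_{j=1}^{p} \sum_{s=2}^{B} \left\{ |\Delta u^j(s)|\, \mathbb{1}(\Delta\beta_0^j(s) = 0) + \operatorname{sgn}(\Delta\beta_0^j(s)) \big(\Delta u^j(s)\big)\, \mathbb{1}(\Delta\beta_0^j(s) \ne 0) \right\}$$

and for each $s$, $\xi_{\mathrm{mult}}(s)$ is a centered $p$-dimensional Gaussian vector with covariance matrix equal to

$$E\left[ \left( \int \big( X(t) - e(s, t, \beta_0) \big) Y^s(t)\, dM^s(t) \right)^{\otimes 2} \right].$$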

Theorems 1 and 2 prove the consistency and asymptotic normality of our estimators (8) and (9). This ensures that they behave better than the constant estimators when $\beta_0$ is non-constant. In addition, the considered penalty will induce sparsity for each covariate $j = 1, \ldots, p$ in the successive differences $\Delta\beta^j(s)$, $s = 2, \ldots, B$. As a consequence, the effects of a covariate on two consecutive events will often be equal. We show, in the following simulation study, that this induced sparsity improves the behaviour of our estimators compared to the unconstrained ones (defined in Equations (3) and (5)).

4. Simulation studies 

We compare the performances of the penalized estimators (8) and (9), the constant ones (4) and (6), and the unconstrained ones (3) and (5). To mimic the bladder tumour cancer dataset studied in Section 5, we set $p = 4$ and consider $B = 5$ recurrent events for the estimation. In the multiplicative and additive models, the sample size $n$ varies from $n = 50 = 2.5\,pB$ to $n = 1000 \approx (pB)^{2.3}$.

We draw the $p = 4$ covariates from uniform distributions and set the parameter values at $\beta^1 = (0, 0, b_1, b_1, 0, \ldots, 0)$, $\beta^2 = (b_2, \ldots, b_2)$, $\beta^3 = b_3 (1, 2, 3, \ldots)$ and $\beta^4 = (0, \ldots, 0)$. We generate recurrent event times from the multiplicative (1) and additive (2) models with baseline defined through the Weibull distribution with shape parameter $a_W$ and scale parameter 1. The death and censoring times are generated from exponential distributions with parameters $a_D$ and $a_C$ respectively. We set the value of parameter $a_W$ at 2.5. Finally, the values of $a_D$ and $a_C$ are empirically determined to obtain $P_{\mathrm{obs}} = 28$ to $29\%$ and $14$ to $15\%$ of individuals experiencing the fifth event.
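The paper does not spell out the sampling scheme, so the following is only one possible way to generate data of this kind in the multiplicative model, under explicit assumptions: covariates are fixed in time, the baseline cumulative rate is the Weibull form $t^{a_W}$ (scale 1), event times within each stratum are drawn by inverting the cumulative rate as for a non-homogeneous Poisson process, and the exponential parameters are read as rates. The function name `simulate_subject` and all numerical values other than $p = 4$, $B = 5$ and $a_W = 2.5$ are hypothetical.

```python
import numpy as np


def simulate_subject(x, beta, shape, follow_up, rng):
    """Draw recurrent event times for one subject.

    x: covariates, shape (p,); beta: coefficients, shape (p, B).
    Returns increasing event times not exceeding follow_up (at most B of them).
    """
    times, t_prev = [], 0.0
    for s in range(beta.shape[1]):
        e = rng.exponential(1.0)
        # Invert the cumulative rate: t^shape - t_prev^shape = e * exp(-x beta(s)).
        t_next = (t_prev ** shape + e * np.exp(-x @ beta[:, s])) ** (1.0 / shape)
        if t_next > follow_up:
            break
        times.append(t_next)
        t_prev = t_next
    return np.array(times)


rng = np.random.default_rng(1)
p, B = 4, 5
x = rng.uniform(size=p)                  # uniform covariates, as in the text
beta = np.zeros((p, B))                  # hypothetical coefficients (no covariate effect)
death = rng.exponential(1.0 / 0.5)       # hypothetical rate a_D = 0.5
censoring = rng.exponential(1.0 / 0.5)   # hypothetical rate a_C = 0.5

events = simulate_subject(x, beta, shape=2.5,
                          follow_up=min(death, censoring), rng=rng)
print(events)
```

Tuning the two exponential rates changes how many subjects are still under observation when the fifth event occurs, which is how a target value of $P_{\mathrm{obs}}$ could be reached empirically.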

To evaluate the performances of the different estimators, we conduct a Monte Carlo study with $M = 200$ replications. The estimation accuracy is investigated for each method via a mean squared rescaled error defined as



$$\mathrm{MSE} = \frac{1}{M} \sum_{m=1}^{M} \frac{\|\hat\beta_m - \beta_0\|^2}{\|\beta_0\|^2}, \tag{10}$$



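where $\hat\beta_m$ is the estimate in sample $m$. We furthermore study the detection power of non-constant (respectively constant) covariate effects by computing mean false positive (FP) rates and mean false negative (FN) rates for each method. They are defined, for an estimate $\hat\beta_m$, as

$$\mathrm{FP}(\hat\beta_m) = \mathrm{Card}\left\{ j \in \{1, \ldots, p\} \text{ s.t. } \mathrm{TV}(\hat\beta_m^j) \ne 0 \text{ and } \mathrm{TV}(\beta_0^j) = 0 \right\} \tag{11}$$

and

$$\mathrm{FN}(\hat\beta_m) = \mathrm{Card}\left\{ j \in \{1, \ldots, p\} \text{ s.t. } \mathrm{TV}(\hat\beta_m^j) = 0 \text{ and } \mathrm{TV}(\beta_0^j) \ne 0 \right\}, \tag{12}$$

where TV is defined in (7).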
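The counts (11) and (12) can be computed directly from the total variations of the estimated and true coefficient profiles. The sketch below does so for a parameter design with the same pattern as the one used in this section (two non-constant and two constant covariates); the value `b = 0.5`, the tolerance `tol` and the two artificial "fits" are hypothetical and serve only to illustrate the definitions.

```python
import numpy as np


def total_variation(beta_j):
    """TV of one covariate profile over the B events, as in (7)."""
    return np.sum(np.abs(np.diff(beta_j)))


def fp_fn(beta_hat, beta_0, tol=1e-8):
    """False positives (11) and false negatives (12).

    beta_hat, beta_0: arrays of shape (p, B), one row per covariate.
    A total variation below tol is treated as zero.
    """
    varies_hat = np.array([total_variation(row) for row in beta_hat]) > tol
    varies_0 = np.array([total_variation(row) for row in beta_0]) > tol
    fp = int(np.sum(varies_hat & ~varies_0))
    fn = int(np.sum(~varies_hat & varies_0))
    return fp, fn


b = 0.5  # hypothetical magnitude
beta_0 = np.array([
    [0, 0, b, b, 0],                    # non-constant effect
    [b] * 5,                            # constant effect
    [b * k for k in (1, 2, 3, 4, 5)],   # non-constant effect
    [0] * 5,                            # constant (null) effect
], dtype=float)

# A fit that is constant over events, and a fit where every profile varies.
constant_fit = np.tile(beta_0.mean(axis=1, keepdims=True), (1, 5))
rough_fit = beta_0 + 0.01 * np.arange(1, 6)

print(fp_fn(constant_fit, beta_0))
print(fp_fn(rough_fit, beta_0))
```

With this design, any estimator that is constant over events misses both non-constant covariates, and any estimator whose profiles all vary flags both constant covariates, whatever the sample size.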

As expected, the constant model is biased and behaves poorly for our choice of a non-constant $\beta_0$. The comparison between the unconstrained and penalized estimators is



Table 1. Simulation results in the multiplicative model for P_obs = 28%

        Unconstrained          Constant               TV                     two-steps TV
n       MSE     FP     FN      MSE     FP     FN      MSE     FP     FN      MSE     FP     FN
50      0.100   2      0       0.412   0      2       0.054   1.44   0.03    0.044   0.82   0.02
100     0.030   2      0       0.415   0      2       0.025   1.54   0       0.019   0.76   0
500     0.006   2      0       0.413   0      2       0.008   1.76   0       0.006   0.30   0
1000    0.005   2      0       0.415   0      2       0.006   1.81   0       0.006   0.05   0

MSE: mean squared error, FP: false positives, FN: false negatives.

Table 2. Simulation results in the multiplicative model for P_obs = 14%

        Unconstrained          Constant               TV                     two-steps TV
n       MSE     FP     FN      MSE     FP     FN      MSE     FP     FN      MSE     FP     FN
50      NA      NA     NA      0.440   0      2       0.161   1.37   0.185   0.137   0.82   0.19
100     0.566   2      0       0.434   0      2       0.053   1.55   0.005   0.042   0.88   0
500     0.014   2      0       0.433   0      2       0.016   1.84   0       0.012   1.06   0
1000    0.009   2      0       0.433   0      2       0.011   1.89   0       0.010   0.68   0

MSE: mean squared error, FP: false positives, FN: false negatives, NA: not applicable.

Table 3. Simulation results in the additive model for P_obs = 28%

        Unconstrained          Constant               TV                     two-steps TV
n       MSE     FP     FN      MSE     FP     FN      MSE     FP     FN      MSE     FP     FN
50      4.986   2      0       0.416   0      2       0.467   0.98   0.58    1.142   0.65   0.81
100     0.935   2      0       0.351   0      2       0.254   1.38   0.21    0.353   0.86   0.48
500     0.135   2      0       0.309   0      2       0.079   1.91   0.01    0.094   1.44   0.08
1000    0.071   2      0       0.299   0      2       0.049   1.98   0       0.05    1.64   0

MSE: mean squared error, FP: false positives, FN: false negatives.





Table 4. Simulation results in the additive model for P_obs = 14%

        Unconstrained          Constant               TV                     two-steps TV
n       MSE     FP     FN      MSE     FP     FN      MSE     FP     FN      MSE     FP     FN
50      NA      NA     NA      0.505   0      2       0.781   0.95   0.81    2.368   0.86   0.97
100     4.114   2      0       0.393   0      2       0.707   1.450  0.27    0.84    1.11   0.52
500     0.339   2      0       0.330   0      2       0.154   1.975  0.01    0.19    1.67   0.06
1000    0.171   2      0       0.320   0      2       0.097   1.995  0       0.12    1.80   0.02

MSE: mean squared error, FP: false positives, FN: false negatives, NA: not applicable.



in favour of our estimator in all four cases as long as $n$ is smaller than $(pB)^2$. When the percentage of individuals experiencing the fifth event drops, non-constant estimators are slightly less accurate. The algorithms are not able to compute all $M = 200$ unconstrained estimators for $n = 50$. For $p = 4$, $B = 5$, $n = 100$ and $P_{\mathrm{obs}} = 14\%$ (which are values close to those encountered in the bladder tumour cancer dataset studied in the next section), our penalized estimators are respectively 5.8 times, in the additive model, and 10.6 times, in the multiplicative model, better than the unconstrained ones in terms of estimation error.

Surprisingly, the number of false positives detected by our penalized estimators increases when the sample size increases. A possible way to reduce it is to apply the reweighted lasso, or two-steps lasso, as proposed in Candès et al. (2008) (details are given in the Supplementary Material). We compute the mean squared error and the false positive and false negative rates of the resulting estimator. It shows better false positive rates than the first-step penalized estimator, greater false negative rates and comparable mean squared errors.

We repeat the simulation study for $a_W = 2.5$ and then for a Gompertz baseline with shape parameter $a_G = 0.5$ (and $a_G = 0.5$) and scale parameter 1. The results are reported in the Supplementary Material. Conclusions are similar.



5. Bladder tumour data analysis 

In this section we illustrate the behaviour of our estimators on the bladder tumour cancer data of Byar (1980). These data were obtained from a clinical trial conducted by the Veterans Administration Co-operative Urological Group. One hundred and sixteen patients were randomised to one of three treatments: placebo, pyridoxine or thiotepa. For each patient, the times of tumour recurrence were recorded until death or censoring. The number of recurrences ranges from 0 to 10. Among the $n = 116$ patients, since 13.79% experienced at least five tumour recurrences and only 6.9% experienced six tumour recurrences or more, we set the parameter $B$ to 5. In addition to the two treatment variables, pyridoxine and thiotepa, two supplementary covariates were recorded for each patient: the number of initial tumours and the size of the largest initial tumour.

Figure 1 displays the estimates obtained from the constant, unconstrained and total variation estimators in the multiplicative model. In order to reinforce the variable selection performance of the total variation estimator, the coefficients were estimated using the reweighted lasso. The unconstrained estimator shows very strong variations and is difficult to interpret as such. On the other hand, the constant estimator gives valuable information on the impact of each covariate, but in turn cannot detect a change in variation. Our total-variation estimator reaches a compromise: it is not constant but is easily interpretable.

For instance, a remarkable aspect of the pyridoxine treatment can be highlighted from the total variation estimation: this treatment produces a protective effect for the first three tumour recurrences but the rate of further recurrences is increased by this treatment. In the same way, an increase in the effect of the initial number of tumours on recurrences is observed from the third recurrence. Conversely, the effects of the thiotepa treatment and of the size of the largest tumour are shown to be constant in the total variation model, the parameter estimates having values similar to the ones obtained in the constant model.

Our conclusions on the treatment effects are in agreement with previous studies on bladder tumour recurrences. For instance, no difference in the rate or time to tumour recurrence was found between patients using pyridoxine and patients using placebo in Tanaka et al. (2011) and Goossens et al. (2012). Moreover, Huang & Chen (2003) and Sun et al. (2006) have respectively studied gap time recurrences in the multiplicative and additive models. The results obtained from the former showed a small protective effect of this treatment while the latter concluded that gap times did not seem related to pyridoxine. These examples illustrate the nice features of our total-variation estimator: it provides sharper results, giving relevant information on covariate effects with respect to the number of recurrent events experienced by a subject, and it provides the ability to detect a change of variation. Further details are provided in the Supplementary Material.



Fig. 1. Estimates for the bladder data in the multiplicative model. The crosses represent the constant estimator, the filled circles the unconstrained estimator and the squares the reweighted lasso estimator.



6. Discussion 

In this paper, the Aalen and Cox models were studied to model the effect of covariates 
on the rate function. However, such models are not essential in our approach. Penalized 
algorithms could be easily derived for other models such as the accelerated failure time 
model or the semiparametric transformation model for instance. 

Although we have only presented asymptotic theoretical results, the simulation studies 
show clear evidence that our estimators outperform standard estimators for small sample 
sizes. Therefore, it would be of great interest to study their finite sample properties. 
However, such results involve deviation inequalities for non i.i.d. and non martingale 
empirical processes. To our knowledge, no such results have yet been established in the 
context of recurrent events. 

Another development of the present paper would be to establish results for the estimation of change-point locations and the number of change-points. Such results can be found for change-point detection in the mean of a Gaussian signal in Harchaoui & Lévy-Leduc (2010), for instance.



Appendix: Proofs 
Proofs of Lemmas A1 to A3 are in the Supplementary Material.

A key relation
Lemma A1. Under Assumption 1, for all $i = 1, \ldots, n$,

$$E\{dN_i(t) \mid X_i(t), D_i \wedge C_i \ge t, N_i(t) = s-1\} = Y_i(t)\,\rho_0(t, s, X_i(t))\,dt.$$

Decomposition of the least squares criterion in the additive model
The next lemma gives the details of the construction of the partial least squares in the additive model. One has to notice that the processes $Z_n(s)$ introduced below are centered, which implies that finding a minimizer of $L_n^{\mathrm{PLS}}$ is a natural way of estimating $\beta_0$ in model (2).

Lemma A2. In the additive event-specific model (2), the partial least squares criterion (5) can be rewritten as

$$L_n^{\mathrm{PLS}}(\beta) = \sum_{s=1}^{B} \left\{ \beta(s)^\top H_n(s) \beta(s) - 2 \beta(s)^\top H_n(s) \beta_0(s) - 2 Z_n(s) \beta(s) \right\}, \tag{A1}$$

where

$$Z_n(s) = \frac{1}{n} \sum_{i=1}^{n} \int \big\{ X_i(t) - \bar X^s(t) \big\} \mathbb{1}(N_i(t) = s)\, dM_i^s(t).$$

A technical lemma 
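Lemma A3. Let $\mathcal{D}[0, \tau]$ denote the set of càdlàg functions on $[0, \tau]$ and let $F_n(\cdot)$ and $f(T, \delta, X(\cdot), N(\cdot))$ be two random processes of bounded variation on $[0, \tau]$. Suppose that for all $z \in [0, \tau]$,

$$E\left[ \left( \int_0^z f(T, \delta, X(t), N(t))\, dM^s(t) \right)^2 \right] < \infty.$$

We then have the following properties:

(i) If $f(T, \delta, X(\cdot), N(\cdot))$ is a random variable of bounded variation on $[0, \tau]$, then

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^z f(T_i, \delta_i, X_i(t), N_i(t))\, dM_i^s(t)$$

converges weakly in $\mathcal{D}[0, \tau]$ to a centered Gaussian process with variance equal to

$$E\left[ \left( \int_0^z f(T, \delta, X(t), N(t))\, dM^s(t) \right)^2 \right].$$

(ii) If $\sup_{t \in [0, \tau]} |F_n(t) - F(t)| = o_P(1)$, where $F(\cdot)$ is a random process on $[0, \tau]$, then

$$\sup_{z \in [0, \tau]} \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int_0^z \big(F_n(t) - F(t)\big) f(T_i, \delta_i, X_i(t), N_i(t))\, dM_i^s(t) \right| = o_P(1).$$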

Proof of Theorem 1 
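Proof of 1. Let $\Gamma_n^{\mathrm{add}}(\beta)$ be the quantity minimized by $\hat\beta_{\mathrm{TV/add}}$ and introduce $\Gamma_{\mathrm{add}}(\beta) = \sum_{s=1}^{B} \left[ \beta(s)^\top H(s) \beta(s) - 2 h(s) \beta(s) \right]$ where

$$h(s) := \int E[\mathbb{1}(N(t) = s) X(t)\, dN(t)] - \int \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]}\, E[\mathbb{1}(N(t) = s)\, dN(t)].$$

Using Lemma A1, notice that $h(s) = \beta_0(s)^\top H(s)$ and consequently $\operatorname{argmin}_\beta \Gamma_{\mathrm{add}} = \beta_0$. Since the criterion to minimize is convex, the convergence in probability of $\hat\beta_{\mathrm{TV/add}}$ to $\beta_0$ follows from the pointwise convergence of $\Gamma_n^{\mathrm{add}}(\beta)$ towards $\Gamma_{\mathrm{add}}(\beta)$. Now write:

$$|\Gamma_n^{\mathrm{add}}(\beta) - \Gamma_{\mathrm{add}}(\beta)| \le |L_n^{\mathrm{PLS}}(\beta) - \Gamma_{\mathrm{add}}(\beta)| + \frac{\lambda_n}{n} B p \max_{s,j} |\beta^j(s) - \beta^j(s-1)|$$

$$\le B p^2 \max_{j,k,s} \left| \beta^j(s) \beta^k(s) \big( H_n^{j,k}(s) - H^{j,k}(s) \big) \right| + 2 B p \max_{j,s} |h_n^j(s) - h^j(s)|\, |\beta^j(s)| + \frac{\lambda_n}{n} B p \max_{s,j} |\beta^j(s) - \beta^j(s-1)|$$

and the result follows from the law of large numbers and the fact that $\lambda_n / n \to 0$ as $n$ tends to infinity.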

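Proof of 2. Define

$$\Lambda_n^{\mathrm{add}}(u) = \sum_{s=1}^{B} u(s)^\top H_n(s) u(s) - 2 \sqrt{n} \sum_{s=1}^{B} Z_n(s) u(s) + \lambda_n \sum_{j=1}^{p} \left( \mathrm{TV}(\beta_0^j + u^j / \sqrt{n}) - \mathrm{TV}(\beta_0^j) \right)$$

and notice that $\Lambda_n^{\mathrm{add}}(u)$ is minimum at $u = \sqrt{n}(\hat\beta_{\mathrm{TV/add}} - \beta_0)$. Write

$$\sqrt{n} \sum_{s=1}^{B} Z_n(s) u(s) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int \sum_{s=1}^{B} \left( X_i(t) - \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]} \right) u(s)\, \mathbb{1}(N_i(t) = s)\, dM_i^s(t) - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \int \sum_{s=1}^{B} \left( \bar X^s(t) - \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]} \right) u(s)\, \mathbb{1}(N_i(t) = s)\, dM_i^s(t).$$

Let $F_n(t) = \sum_{s} \big( \bar X^s(t) - E[Y^s(t) X(t)] / E[Y^s(t)] \big) u(s)$ and $F(t) = 0$. $F_n$ has bounded variation and from Lemma A3 (ii), the second term converges to 0 in probability. Now, take $f(T_i, \delta_i, X_i(t), N_i(t)) = \sum_{s} \big( X_i(t) - E[Y^s(t) X(t)] / E[Y^s(t)] \big) u(s)\, \mathbb{1}(N_i(t) = s)$, which is also a function of bounded variation. From Lemma A3 (i), the first term converges weakly towards a centered Gaussian variable with variance equal to

$$E\left[ \left( \int \sum_{s=1}^{B} \left( X(t) - \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]} \right) u(s)\, \mathbb{1}(N(t) = s)\, dM^s(t) \right)^2 \right] = \sum_{s=1}^{B} u(s)^\top E\left[ \left( \int \left( X(t) - \frac{E[Y^s(t) X(t)]}{E[Y^s(t)]} \right) \mathbb{1}(N(t) = s)\, dM^s(t) \right)^{\otimes 2} \right] u(s).$$

Then, note that $\sum_{s=1}^{B} u(s)^\top H_n(s) u(s)$ converges to $\sum_{s=1}^{B} u(s)^\top H(s) u(s)$ in probability and $\lambda_n \sum_{j} \big( \mathrm{TV}(\beta_0^j + u^j / \sqrt{n}) - \mathrm{TV}(\beta_0^j) \big) / \lambda_0$ converges to

$$\sum_{j=1}^{p} \sum_{s=2}^{B} \left\{ |\Delta u^j(s)|\, \mathbb{1}(\Delta\beta_0^j(s) = 0) + \operatorname{sgn}(\Delta\beta_0^j(s)) \big(\Delta u^j(s)\big)\, \mathbb{1}(\Delta\beta_0^j(s) \ne 0) \right\}.$$

Thus $\Lambda_n^{\mathrm{add}}(u)$ converges to $\Lambda_{\mathrm{add}}(u)$ in distribution. Since $\Lambda_n^{\mathrm{add}}$ is convex and $\Lambda_{\mathrm{add}}$ has a unique minimum, it follows that $\sqrt{n}(\hat\beta_{\mathrm{TV/add}} - \beta_0)$ converges to $\operatorname{argmin}_u \Lambda_{\mathrm{add}}(u)$ in distribution.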

Proof of Theorem 2 
First define for $l = 0$, 1 or 2

$$S_n^{(l)}(s, t, \beta) = \frac{1}{n} \sum_{i=1}^{n} Y_i^s(t) X_i(t)^{\otimes l} \exp\big(X_i(t)\beta(s)\big).$$

Following the arguments in Example VII.2.7, page 502, of Andersen et al. (1993), it can easily be shown that

$$\sup_{t \in [0, \tau]} \left| S_n^{(l)}(s, t, \beta_0) - s^{(l)}(s, t, \beta_0) \right| \to 0 \text{ in probability as } n \to \infty, \quad \forall l = 0, 1, 2,$$

using the fact that the covariate process is of bounded variation (in particular, this assumption guarantees that $s^{(l)}(s, t, \beta_0)$ has a countable number of jumps).

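Proof of 1. Let $\Gamma_n^{\mathrm{mult}}(\beta)$ be the quantity minimized by $\hat\beta_{\mathrm{TV/mult}}$ and introduce

$$\Gamma_{\mathrm{mult}}(\beta) = -\sum_{s=1}^{B} \int E[X(t)\beta(s) Y^s(t)\, dN(t)] + \sum_{s=1}^{B} \int \log\big(s^{(0)}(s, t, \beta)\big) E[Y^s(t)\, dN(t)]$$

$$= -\sum_{s=1}^{B} \int \alpha_0(t, s) \left( \beta(s)^\top s^{(1)}(s, t, \beta_0) - \log\big(s^{(0)}(s, t, \beta)\big)\, s^{(0)}(s, t, \beta_0) \right) dt,$$

where the last equality follows from Lemma A1. From similar arguments as in the proof of part 1 of Theorem 1 and the uniform convergence with respect to $t$ of $S_n^{(l)}(s, t, \beta_0)$ towards $s^{(l)}(s, t, \beta_0)$, we get the pointwise convergence in probability of $\Gamma_n^{\mathrm{mult}}(\beta)$ to $\Gamma_{\mathrm{mult}}(\beta)$. Then, the consistency of $\hat\beta_{\mathrm{TV/mult}}$ follows from the convexity of $\Gamma_n^{\mathrm{mult}}(\beta)$ and the fact that $\operatorname{argmin}_\beta \Gamma_{\mathrm{mult}}(\beta) = \beta_0$.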
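Proof of 2. Consider the convex function

$$\Lambda_n^{\mathrm{mult}}(u) = n L_n^{\mathrm{PL}}(\beta_0 + u / \sqrt{n}) - n L_n^{\mathrm{PL}}(\beta_0) + \lambda_n \sum_{j=1}^{p} \left( \mathrm{TV}(\beta_0^j + u^j / \sqrt{n}) - \mathrm{TV}(\beta_0^j) \right),$$

which is minimum at $u = \sqrt{n}(\hat\beta_{\mathrm{TV/mult}} - \beta_0)$. Then from a Taylor expansion, one gets

$$\Lambda_n^{\mathrm{mult}}(u) = -\frac{1}{\sqrt{n}} \sum_{s=1}^{B} \sum_{i=1}^{n} \int \big( X_i(t) - E_n(s, t, \beta_0) \big) Y_i^s(t)\, dN_i(t)\, u(s) + \frac{1}{2n} \sum_{s=1}^{B} u(s)^\top \sum_{i=1}^{n} \int V_n(s, t, \beta_0) Y_i^s(t)\, dN_i(t)\, u(s) + \lambda_n \sum_{j=1}^{p} \left( \mathrm{TV}(\beta_0^j + u^j / \sqrt{n}) - \mathrm{TV}(\beta_0^j) \right) + o_P(1),$$

where

$$E_n(s, t, \beta) = \frac{S_n^{(1)}(s, t, \beta)}{S_n^{(0)}(s, t, \beta)}, \qquad V_n(s, t, \beta) = \frac{S_n^{(2)}(s, t, \beta)}{S_n^{(0)}(s, t, \beta)} - E_n(s, t, \beta)^{\otimes 2}.$$

The uniform convergence with respect to $t$ of the $S_n^{(l)}(s, t, \beta_0)$ towards the $s^{(l)}(s, t, \beta_0)$ and the law of large numbers give the convergence in probability of the term

$$\frac{1}{2n} \sum_{s=1}^{B} u(s)^\top \sum_{i=1}^{n} \int V_n(s, t, \beta_0) Y_i^s(t)\, dN_i(t)\, u(s)$$

towards

$$\frac{1}{2} \sum_{s=1}^{B} u(s)^\top \int v(s, t, \beta_0) E[Y^s(t)\, dN(t)]\, u(s).$$

Notice that

$$\sum_{i=1}^{n} \big( X_i(t) - E_n(s, t, \beta_0) \big) Y_i^s(t)\, \alpha_0(t, s) \exp\big(X_i(t)\beta_0(s)\big)\, dt = 0$$

in order to rewrite the first term of $\Lambda_n^{\mathrm{mult}}(u)$ as

$$-\frac{1}{\sqrt{n}} \sum_{s=1}^{B} \sum_{i=1}^{n} \int \big( X_i(t) - E_n(s, t, \beta_0) \big) u(s)\, Y_i^s(t)\, dM_i^s(t).$$

From Lemma A3, the same kind of arguments as in the proof of Theorem 1 can be applied to conclude the proof.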

Supplementary material

Supplementary material includes a description of the algorithms, an extended simulation study and additional analysis of the bladder tumour data of Byar (1980). It also contains proofs of Proposition 2 and Lemma 3.



